linear model




Adaptive Linear Estimating Equations

Neural Information Processing Systems

Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such a data collection mechanism often introduces complexities to the statistical inference procedure.




A Fast and Accurate Estimator for Large Scale Linear Model via Data Averaging

Neural Information Processing Systems

The asymptotic behavior of the proposed estimation procedure is studied. Our theoretical results show that the proposed method can achieve a faster convergence rate than the optimal convergence rate for sampling methods.



Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Neural Information Processing Systems

On the other hand, recent findings on the neural tangent kernel enable us to approximate a wide neural network with a linear model of the network's random features. In this paper, we theoretically analyze the knowledge distillation of a wide neural network. First, we provide a transfer risk bound for the linearized model of the network. Then, we propose a metric of the task's training difficulty, called data inefficiency.


A Flexible Empirical Bayes Approach to Generalized Linear Models, with Applications to Sparse Logistic Regression

Xie, Dongyue, Zhu, Wanrong, Stephens, Matthew

arXiv.org Machine Learning

We introduce a flexible empirical Bayes approach for fitting Bayesian generalized linear models. Specifically, we adopt a novel mean-field variational inference (VI) method and the prior is estimated within the VI algorithm, making the method tuning-free. Unlike traditional VI methods that optimize the posterior density function, our approach directly optimizes the posterior mean and prior parameters. This formulation reduces the number of parameters to optimize and enables the use of scalable algorithms such as L-BFGS and stochastic gradient descent. Furthermore, our method automatically determines the optimal posterior based on the prior and likelihood, distinguishing it from existing VI methods that often assume a Gaussian variational. Our approach represents a unified framework applicable to a wide range of exponential family distributions, removing the need to develop unique VI methods for each combination of likelihood and prior distributions. We apply the framework to solve sparse logistic regression and demonstrate the superior predictive performance of our method in extensive numerical studies, by comparing it to prevalent sparse logistic regression approaches.


Long-Term Probabilistic Forecast of Vegetation Conditions Using Climate Attributes in the Four Corners Region

McPhillips, Erika, Lee, Hyeongseong, Xie, Xiangyu, Baylis, Kathy, Funk, Chris, Gu, Mengyang

arXiv.org Machine Learning

Weather conditions can drastically alter the state of crops and rangelands, and in turn, impact the incomes and food security of individuals worldwide. Satellite-based remote sensing offers an effective way to monitor vegetation and climate variables on regional and global scales. The annual peak Normalized Difference Vegetation Index (NDVI), derived from satellite observations, is closely associated with crop development, rangeland biomass, and vegetation growth. Although various machine learning methods have been developed to forecast NDVI over short time ranges, such as one-month-ahead predictions, long-term forecasting approaches, such as one-year-ahead predictions of vegetation conditions, are not yet available. To fill this gap, we develop a two-phase machine learning model to forecast the one-year-ahead peak NDVI over high-resolution grids, using the Four Corners region of the Southwestern United States as a testbed. In phase one, we identify informative climate attributes, including precipitation and maximum vapor pressure deficit, and develop the generalized parallel Gaussian process that captures the relationship between climate attributes and NDVI. In phase two, we forecast these climate attributes using historical data at least one year before the NDVI prediction month, which then serve as inputs to forecast the peak NDVI at each spatial grid. We developed open-source tools that outperform alternative methods for both gross NDVI and grid-based NDVI one-year forecasts, providing information that can help farmers and ranchers make actionable plans a year in advance.